Skip to content
This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

refactor: async iterables #2547

Closed
wants to merge 13 commits into from
Closed

refactor: async iterables #2547

wants to merge 13 commits into from

Conversation

alanshaw
Copy link
Member

@alanshaw alanshaw commented Oct 17, 2019

This PR allows ipfsx to be used by calling IPFS.create(options) with { EXPERIMENTAL: { ipfsx: true } } options.

Edit: #2547 (comment)

It adds a single API method add that returns an iterator that yields objects of the form { cid, path, size }. The iterator is decorated with a first and last function so users can conveniently await on the first or last item to be yielded as per the proposal here.

In order to boot up a new ipfsx node I refactored the boot procedure to enable the following:

  1. Remove the big stateful blob "self" - components are passed just the dependencies they need to operate. Right now it is opaque as to which components require which parts of an IPFS node without inspecting the entirety of the component's code. This change makes it easier to look at a component and know what aspects of the IPFS stack it uses and consequently allows us to understand which APIs should be available at which points of the node's lifecycle. It makes the code easier to understand, more maintainable and easier to mock dependencies for unit tests.
  2. Restrict usage of subsystem APIs to appropriate lifecycle stage(s). This PR introduces an ApiManager that allows us to update the API that is exposed at any given point. It allows us to (for example) disallow ipfs.add before the node is initialized or access libp2p before the node is started. The lifecycle methods init, start and stop each define which API methods are available after they have run avoiding having to put boilerplate in every method to check if it can be called when the node is in a particular state. See What does it mean to start an IPFS node? #1438
  3. Safer and more flexible API usage. The ApiManager allows us to temporarily change APIs to stop init from being called again while it is already running and has the facility to rollback to the previous API state if an operation fails. It also enables piggybacking so we don't attempt 2 or more concurrent start/stop calls at once. See state machine, transitions and events #1061 Error: Not able to stop from state: stopping #2257
  4. Enable config changes at runtime. Having an API that can be updated during a node's lifecycle will enable this feature in the future.

FEEDBACK REQUIRED: The changes I've made here are a little...racy. They have a bunch of benefits, as I've outlined above but the ApiManager is implemented as a Proxy, allowing us to swap out the underlying API at will. How do y'all feel about that? Is there a better way or got a suggestion?

resolves #1438
resolves #1061
resolves #2257
resolves #2509
resolves #1670
refs ipfs-inactive/interface-js-ipfs-core#394

BREAKING CHANGE: IPFS.createNode removed

BREAKING CHANGE: IPFS is not a class that can be instantiated - use IPFS.create. An IPFS node instance is not an event emitter.

BREAKING CHANGE: Instance .ready property removed. Please use IPFS.create instead.

BREAKING CHANGE: Callbacks are no longer supported on any API methods. Please use a utility such as callbackify on API methods that return Promises to emulate previous behaviour.

BREAKING CHANGE: PeerId and PeerInfo classes are no longer statically exported from ipfs-http-client since they are no longer used internally.

BREAKING CHANGE: pin.add results now contain a cid property (a CID instance) instead of a string hash property.

BREAKING CHANGE: pin.ls now returns an async iterable.

BREAKING CHANGE: pin.ls results now contain a cid property (a CID instance) instead of a string hash property.

BREAKING CHANGE: pin.rm results now contain a cid property (a CID instance) instead of a string hash property.

BREAKING CHANGE: add now returns an async iterable.

BREAKING CHANGE: add results now contain a cid property (a CID instance) instead of a string hash property.

BREAKING CHANGE: addReadableStream, addPullStream have been removed.

BREAKING CHANGE: ls now returns an async iterable.

BREAKING CHANGE: ls results now contain a cid property (whose value is a CID instance) instead of a hash property.

BREAKING CHANGE: files.readPullStream and files.readReadableStream have been removed.

BREAKING CHANGE: files.read now returns an async iterable.

BREAKING CHANGE: files.lsPullStream and files.lsReadableStream have been removed.

BREAKING CHANGE: files.ls now returns an async iterable.

BREAKING CHANGE: files.ls results now contain a cid property (whose value is a CID instance) instead of a hash property.

BREAKING CHANGE: files.ls no longer takes a long option (in core) - you will receive all data by default.

BREAKING CHANGE: files.stat result now contains a cid property (whose value is a CID instance) instead of a hash property.

BREAKING CHANGE: get now returns an async iterable. The content property value for objects yielded from the iterator is now an async iterable that yields BufferList objects.

BREAKING CHANGE: stats.bw now returns an async iterable.

BREAKING CHANGE: addFromStream has been removed. Use add instead.

BREAKING CHANGE: addFromFs has been removed. Please use the exported globSource utility and pass the result to add. See the glob source documentation for more details and an example.

BREAKING CHANGE: addFromURL has been removed. Please use the exported urlSource utility and pass the result to add. See the URL source documentation for more details and an example.

BREAKING CHANGE: name.resolve now returns an async iterable. It yields increasingly more accurate resolved values as they are discovered until the best value is selected from the quorum of 16. The "best" resolved value is the last item yielded from the iterator. If you are interested only in this best value you could use it-last to extract it like so:

const last = require('it-last')
await last(ipfs.name.resolve('/ipns/QmHash'))

BREAKING CHANGE: block.rm now returns an async iterable.

BREAKING CHANGE: dht.findProvs, dht.provide, dht.put and dht.query now all return an async iterable.

BREAKING CHANGE: dht.findPeer, dht.findProvs, dht.provide, dht.put and dht.query now yield/return an object { id: CID, addrs: Multiaddr[] } instead of a PeerInfo instance(s).

BREAKING CHANGE: refs and refs.local now return an async iterable.

BREAKING CHANGE: object.data now returns an async iterable that yields Buffer objects.

BREAKING CHANGE: ping now returns an async iterable.

BREAKING CHANGE: repo.gc now returns an async iterable.

BREAKING CHANGE: swarm.peers now returns an array of objects with a peer property that is a CID, instead of a PeerId instance.

BREAKING CHANGE: swarm.addrs now returns an array of objects { id: CID, addrs: Multiaddr[] } instead of PeerInfo instances.

BREAKING CHANGE: block.stat result now contains a cid property (whose value is a CID instance) instead of a key property.

BREAKING CHANGE: bitswap.wantlist now returns an array of CID instances.

BREAKING CHANGE: bitswap.stat result has changed - wantlist and peers values are now an array of CID instances.

BREAKING CHANGE: the init option passed to the IPFS constructor will now not take any initialization steps if it is set to false. Previously, the repo would be initialized if it already existed. This is no longer the case. If you wish to initialize a node but only if the repo exists, pass init: { allowNew: false } to the constructor.

BREAKING CHANGE: removed file ls command from the CLI and HTTP API.

BREAKING CHANGE: Delegated peer and content routing modules are no longer included as part of core (but are still available if starting a js-ipfs daemon from the command line). If you wish to use delegated routing and are creating your node programmatically in Node.js or the browser you must npm install libp2p-delegated-content-routing and/or npm install libp2p-delegated-peer-routing and provide configured instances of them in options.libp2p. See the module repos for further instructions:

@achingbrain
Copy link
Member

Much excite, will digest and comment. I am 👎 on extending iterators though, convenient that the functions are, I don't think we should mess with core language features.

@alanshaw
Copy link
Member Author

N.B. this PR is NOT yet finished, I'm not sure it even runs and it has no tests, just looking for general feedback right now before I invest the last chunk of time to finish it.

I am 👎 on extending iterators though, convenient that the functions are, I don't think we should mess with core language features.

Can you elaborate? Many iterable objects in JavaScript have other functions available on them, is it really "messing" with core language features to return an iterable object that has it's own functions?

It's really beneficial to have a first/last function as it reduces boilerplate on a very common use case. I guess alternatively we could export a utility function from IPFS or mandate that people install and use their own...? I'm open to suggestions. Maybe a different API 😅?

Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding reducing boilerplate related to first/last, I believe that key offenders are pretty specific:

  • Adding a single thing with ipfs.add and getting "array with one result" which needs to be unpacked.
  • Adding a directory tree with ipfs.add just to get the root CID (last item on the results array)

Personally I really like the uniformity of having first/last, but if its not the best way to do it, addressing those two cases I mentioned could get us mostly there.

src/core/errors.js Outdated Show resolved Hide resolved
Comment on lines 34 to 40
const iterator = pipe(
normaliseAddInput(source),
source => importer(source, ipld, opts),
transformFile(dag, opts),
preloadFile(preload, opts),
pinFile(pin, opts)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like pipelines like this one.
Easy to understand what needs to happen just by scanning only a few lines without going too deep into details.
Doing more of this would make it easier to onboard new contributors to the codebase.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

me too!

@daviddias
Copy link
Member

  1. Remove the big stateful blob "self" - components are passed just the dependencies they need to operate.

👏🏽 👍🏽 👏🏽 👍🏽

  1. Restrict APIs to appropriate lifecycle stage(s).

I believe the terms API and subsystem here is getting mixed. When a subsystem is not yet active, the API should not be exposed.

  1. Safer and more flexible API usage.

Nice

  1. Enable config changes at runtime.

🤔

Overall, this PR proposes a lot of really good refactors to the internals, I like where it is going :) Thank you!

@alanshaw
Copy link
Member Author

I believe the terms API and subsystem here is getting mixed.

Yes this should have read something more like "Restrict usage of subsystem APIs to appropriate lifecycle stage(s)"

When a subsystem is not yet active, the API should not be exposed.

I think it's better to throw a meaningful error explaining why the API cannot be accessed rather than the user getting undefined is not a function. The former is what happens in this PR.

Overall, this PR proposes a lot of really good refactors to the internals, I like where it is going :) Thank you!

❤️

@achingbrain
Copy link
Member

I think it's better to throw a meaningful error explaining why the API cannot be accessed rather than the user getting undefined is not a function

Always!

Can you elaborate?

Sure!

Many iterable objects in JavaScript have other functions available on them

The most obvious example that springs to mind is readable streams in node, but when this sort of thing happens I don't think the intention is for them to be consumed as streams and async iterators, rather streams or async iterators. So the interface a result presents should only do one thing and not leave the user in the position of:

Well, I got this Async Iterator from this module and I got this Async Iterator from that module, why can I only call this function on this one but not that one?

..when in fact they haven't iterated asyncly over the result at all.

Better to just let them pass the result to a function that extracts the first result or the last or the nth or whatever they need. Chances are they have a growing collection of modules for dealing with async iterators already anyway - piping, making things duplex, etc.

lidel added a commit to ipfs/community that referenced this pull request Nov 4, 2019
alanshaw pushed a commit to ipfs/community that referenced this pull request Nov 8, 2019
* Update CONTRIBUTING_JS.md

add section on error codes
Ref. ipfs/js-ipfs#2547 (comment)

* Update CONTRIBUTING_JS.md

Co-Authored-By: Teri Chadbourne <terichadbourne@users.noreply.github.com>

* docs: add Error Codes example

License: MIT
Signed-off-by: Marcin Rataj <lidel@lidel.org>

* Update CONTRIBUTING_JS.md

Co-Authored-By: Teri Chadbourne <terichadbourne@users.noreply.github.com>
@alanshaw
Copy link
Member Author

@achingbrain @hugomrdias any chance you can review this PR?

Copy link
Member

@achingbrain achingbrain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still 👎 on extending iterators as it encourages people to not iterate and if they're not iterating, maybe iterators are the wrong thing to return.

That said I'd like to start playing with this API so gogogogogogo!

@alanshaw alanshaw mentioned this pull request Dec 9, 2019
57 tasks
@alanshaw alanshaw changed the title feat: EXPERIMENTAL ipfsx API - boot procedure and add API method refactor: async iterators Dec 10, 2019
Alan Shaw added 3 commits December 10, 2019 11:08
This PR allows ipfsx to be used by calling `IPFS.create(options)` with `{ EXPERIMENTAL: { ipfsx: true } }` options.

It adds a single API method `add` that returns an iterator that yields objects of the form `{ cid, path, size }`. The iterator is decorated with a `first` and `last` function so users can conveniently `await` on the first or last item to be yielded as per the [proposal here](https://github.com/ipfs-shipyard/ipfsx/blob/master/API.md#add).

In order to boot up a new ipfsx node I refactored the boot procedure to enable the following:

1. **Remove the big stateful blob "`self`" - components are passed just the dependencies they need to operate.** Right now it is opaque as to which components require which parts of an IPFS node without inspecting the entirety of the component's code. This change makes it easier to look at a component and know what aspects of the IPFS stack it uses and consequently allows us to understand which APIs should be available at which points of the node's lifecycle. It makes the code easier to understand, more maintainable and easier to mock dependencies for unit tests.
1. **Restrict APIs to appropriate lifecycle stage(s).** This PR introduces an `ApiManager` that allows us to update the API that is exposed at any given point. It allows us to (for example) disallow `ipfs.add` before the node is initialized or access `libp2p` before the node is started. The lifecycle methods `init`, `start` and `stop` each define which API methods are available after they have run avoiding having to put boilerplate in every method to check if it can be called when the node is in a particular state. See #1438
1. **Safer and more flexible API usage.** The `ApiManager` allows us to temporarily change APIs to stop `init` from being called again while it is already running and has the facility to rollback to the previous API state if an operation fails. It also enables piggybacking so we don't attempt 2 or more concurrent start/stop calls at once. See #1061 #2257
1. **Enable config changes at runtime.** Having an API that can be updated during a node's lifecycle will enable this feature in the future.

**FEEDBACK REQUIRED**: The changes I've made here are a little...racy. They have a bunch of benefits, as I've outlined above but the `ApiManager` is implemented as a `Proxy`, allowing us to swap out the underlying API at will. How do y'all feel about that? Is there a better way or got a suggestion?

resolves #1438
resolves #1061
resolves #2257
refs #2509
refs #1670

License: MIT
Signed-off-by: Alan Shaw <alan.shaw@protocol.ai>
@alanshaw
Copy link
Member Author

alanshaw commented Dec 10, 2019

Ok so I'm changing tact with this PR - people seem generally happy with the boot process approach and seeing as we now have a fully refactored ipfs-http-client with a new API and fully refactored interface-ipfs-core test suite that tests against the new API I'm going to use this PR as a base to PR against and bubble these changes in js-ipfs.

This means we won't have a EXPERIMENTAL flag for ipfsx - as I think this can get sorted pretty quickly.

PS. I've also removed the .first() and .last() iterator utils.

@alanshaw alanshaw marked this pull request as ready for review December 10, 2019 11:41
BREAKING CHANGE: `ipfs.add` results now contain a `cid` property (a [CID instance](https://github.com/multiformats/js-cid)) instead of a string `hash` property.

refs ipfs-inactive/interface-js-ipfs-core#394
@alanshaw alanshaw changed the title refactor: async iterators refactor: async iterables Dec 12, 2019
@alanshaw alanshaw requested a review from dirkmc December 12, 2019 15:33
@daviddias
Copy link
Member

All 👍🏽 for all the breaking changes, just unsure about:

image

Most users will fall flat on their faces by not having a convenient method. We've been there in the past, that is why we added these auxiliary methods in the first place.

@alanshaw
Copy link
Member Author

alanshaw commented Dec 13, 2019

Note taken. Just one question - was it problematic because there wasn't a way to do it without rolling your own?

Both js-ipfs and js-ipfs-http-client export urlSource and globSource to give users the power to perform what addFromURL and addFromFs were doing without expanding the API to deal with specific input types/locations.


Screenshot 2019-12-13 at 13 41 35

Screenshot 2019-12-13 at 13 41 20


This will be on the README here, but it should definitely also be in the interface-ipfs-core docs for ipfs.add. I'll make a note.

This keeps our API surface area small and will maybe encourage the community to create other *Source sources! - torrentSource, ftpSource?

Have I convinced you this is ok?

P.S. thank you for taking the time to review ❤️

Alan Shaw added 2 commits December 13, 2019 17:38
* refactor: convert pubsub API to async/await

* refactor: switch wording to not enabled
@alanshaw alanshaw added awesome endeavour exp/expert Having worked on the specific codebase is important P0 Critical: Tackled by core team ASAP labels Dec 16, 2019
@daviddias
Copy link
Member

Oh #2547 (comment) nice! Looks super simple to me :) Thank you!

@alanshaw
Copy link
Member Author

superseded by #2726

@alanshaw alanshaw closed this Jan 24, 2020
@alanshaw alanshaw deleted the feat/ipfsx-boot-and-add branch January 24, 2020 11:38
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
awesome endeavour exp/expert Having worked on the specific codebase is important P0 Critical: Tackled by core team ASAP
Projects
None yet
4 participants